Witzmann, Frank A., and Junyu Li. Cutting-Edge
Technology. II. Proteomics: core technologies and applica- 
tions in physiology. Am J Physiol Gastrointest Liver Physiol 
282: G735-G741, 2002. First published January 9, 2002; 
10.1152/ajpgi.00510.2001. Technologies for proteomics,
i.e., studies examining the protein complement of the genome,
have been in development for over 20 years. More
recently, proteomics has become formalized by combining 
techniques for large-scale protein separation with very pre- 
cise, high-fidelity approaches that analyze, identify, and 
characterize the separated proteins. These methods bring to 
reality the powerful scope of proteomics, enabling research- 
ers to investigate cellular function at the protein level and 
thus representing one of proteomics' most fitting applications.
In this review, we take a brief look at some
of the current, physiologically relevant technologies that 
comprise proteomics and report specific applications in which 
proteomics has provided valuable biological insight. 

protein analysis; two-dimensional electrophoresis; mass 
spectrometry; isotope-coded affinity tags 



THE BROAD RANGE OF MOLECULAR mechanisms that governs
cellular function is largely administered via the
structure and function of genetically encoded products, 
the proteins. Collectively, these gene products repre- 
sent the proteome, and their analysis has come to be 
known as proteomics. The actual number of function- 
ally unique protein types in the human proteome vari- 
ably expressed across assorted human cell types from 
the >30,000 available genes is estimated to be 100,000. 
With multiply-modified forms of each, that number 
could approach a million. This diversity is the result of 
widespread posttranscriptional processing of mRNA 
and co- and posttranslational processes. Both of these
lead to a fair degree of discordance between the open 
reading frames predicting protein structure and the
actual functional product. Consequently, a full under- 
standing of function, disease processes, and clinical 
intervention necessitates expression analysis at the 
protein level. Additionally, the range of fully functional 
protein abundance in a cell may reach nine orders of 
magnitude. Proteomics thus presents investigators 
with a daunting technological task, both in terms of 
protein identification and quantification. Although 
originally designated as a global approach to identify 
the entire proteome (34, 36), using two-dimensional 
(2D) electrophoretic (2DE), mass spectrometric, and 
bioinformatic techniques, proteomics has become a di- 
verse science that includes nearly all manner of sepa- 
ration, affinity purification, and protein chemistry 
components. 

Considerable effort has been and continues to be 
placed on removing the technical barriers that impede 
proteomic efforts. We now understand both the 
strengths and limitations of the "first generation" pro- 
teomics approaches capable of generating significant 
biological insight, yet generally providing narrow data 
(protein presence/absence, protein identification) for 
high and moderately abundant proteins (6). This real- 
ization has led to expanded development and imple- 
mentation of chromatographic separation techniques, 
improved mass spectrometry (MS), automation via ro- 
botics, and growth of multidimensional biomolecular 
datasets (e.g., posttranslational modifications, subcel- 
lular localization, protein interactions, protein abun- 
dance, and protein function). Further technological de- 
velopments will continue to drive forward the next 
generation of proteomic techniques and approaches. 
These developments have widened the scope of pro- 
teomics and have fueled the explosion of interest in 
this field. Two full issues of Trends in Biotechnology 
have addressed these developments in detail (4, 37), 
and readers are encouraged to consult them. This 
themes article highlights some of the technologies cen- 
tral to contemporary proteome analysis and provides 
examples of how these have been applied to physiological
questions.

DIFFERENTIAL EXPRESSION PROTEOMICS 

2DE and MS. As mentioned above, proteomics orig- 
inated as a direct result of technical developments in 
2DE protein separation and MS instrumentation and 
the explosion of genome sequence information that 
generated protein sequence databases. Often referred 
to as "peptide mass fingerprinting," this first-genera- 
tion, or "blue-collar," proteomics approach is still the 
most commonly used proteomics strategy and the most 
practical and economical for academic laboratories. 

In 2DE, proteins are subjected to orthogonal separa- 
tion methods; the first based on protein charge via 
isoelectric focusing (IEF) and then by mass in sodium
dodecyl sulfate PAGE. The relatively recent develop- 
ment of immobilized pH-gradient gel (IPG) strips to 
improve first-dimension IEF separations shows promise,
although gel-based IEF remains a useful tool for
the patient and resourceful. The final product of 2DE 
separation is essentially an in-gel array of proteins, 
each assuming a coordinate position corresponding to 
the unique combination of isoelectric point (pI) and
mass. Resulting 2D protein patterns are visualized by
a number of methods: visible and/or fluorescent dyes,
silver stains, or autoradiography. Typically, scanned 
gel images are analyzed by any of a number of ever-improving
2D gel analysis software packages. It is here
that both the strengths and weaknesses of this approach
become evident. Protein abundance comparisons (e.g., 
differential expression) are easily made, because dif- 
ferences in protein spot density are readily detectable 
and can be quantified robustly and compared statisti- 
cally. However, unless one conducts highly parallel 
2DE runs, gel-to-gel variation becomes problematic 
and image analysis an exercise in frustration. 

Despite the insightful design and implementation of 
parallel 2DE nearly 24 years ago (2, 3) and numerous 
examples of its utility in differential protein expression
analyses across a large number of samples, surpris- 
ingly, this approach has not been used widely. Unlike 
trends in 2D gel analysis software that enable the 
concurrent analysis of hundreds of gel patterns per 
experiment, electrophoretic equipment manufacturers
have lagged behind. Although efforts have been made 
to address the technical necessity of highly parallel 
2DE by scaling the process up to 12 gels/run maximum 
(e.g., Bio-Rad, Amersham Biosciences, etc.), contempo- 
rary 2DE instrumentation still falls short of the scale 
necessary (>20-24 2D gels/run). Figure 1 illustrates
this point by presenting a montage of multiple patterns 
from a single 2DE experiment. Here, 36 individual 2D 
gels were run (20/run) analyzing 36 individual wells 
from six 6-well culture plates on which human keratinocytes
were cultured. Parallel analysis of this type
makes differential expression analysis robust and simplifies
candidate protein selection.

Fig. 1. Example of gel-to-gel consistency across 36 individual gel patterns representing 36 individual human
keratinocyte samples incubated in six 6-well plates and solubilized on the plates (medium removed). Samples were
separated by 2-dimensional (2D) electrophoresis (2DE), 20 gels/run in the authors' laboratory; thus results from 2
separate runs are shown. The first (top left) pattern is the reference pattern in PDQuest analysis, the next 12
patterns are controls, and the remaining 24 are jet fuel exposed. Highly parallel 2DE is essential for inclusion in
successful differential expression proteomics studies.

A recent development that addresses gel-to-gel variability
in 2DE, incorporates sensitive fluorescent protein staining,
and can be achieved without extensive parallel 2DE
instrumentation is 2D differential gel electrophoresis
(DIGE) (32). By using replicate gels of
pooled samples separated on single 2D gels from mul- 
tiple individual animals (control vs. treated in each gel) 
(31), this approach has been used and validated to 
determine quantitative protein differences in acet- 
aminophen toxicity. In this clever approach, control 
and treated samples are labeled with Cy3 or Cy5 dyes 
and mixed before application onto the same 2D gel. As 
a result, the same form of a given protein from each 
sample will migrate to the same position on the 2D gel. 
The relative abundance of each protein in each sample 
is then obtained by scanning the gel using excitation 
and emission wavelengths unique to each Cy dye.
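
The two-channel comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spot identifiers, the dictionary layout, the choice of Cy5/Cy3 as the ratio direction, and the median centering of log ratios are hypothetical conveniences, not the procedure used in the cited DIGE studies.

```python
import math
from statistics import median


def dige_log_ratios(cy3, cy5):
    """Return median-centered log2(Cy5/Cy3) ratios for spots seen in both channels.

    cy3, cy5: dicts mapping a spot identifier to its integrated spot volume
    (hypothetical input format). Median centering is an assumed, simple way
    to correct for unequal total dye signal between the two channels.
    """
    raw = {
        spot: math.log2(cy5[spot] / cy3[spot])
        for spot in cy3
        if spot in cy5 and cy3[spot] > 0 and cy5[spot] > 0
    }
    if not raw:
        return {}
    center = median(raw.values())
    return {spot: value - center for spot, value in raw.items()}


# Hypothetical spot volumes from one gel scanned at both dye wavelengths.
control = {"spot_a": 100.0, "spot_b": 200.0, "spot_c": 50.0}
treated = {"spot_a": 200.0, "spot_b": 400.0, "spot_c": 400.0}
ratios = dige_log_ratios(control, treated)
```

Because both samples run in the same gel, each spot serves as its own positional control, so the ratio needs no gel-to-gel matching step.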

After 2DE separation and image analysis, any or all 
of the proteins in the gel pattern can be selected for 
identification. In large, 2D gel-based projects, the ulti- 
mate goal is to identify all resolved proteins. In less 
ambitious studies, only those proteins differentially 
expressed are of interest. Regardless of project scope, 
the general approach to protein identification is to cut 
the protein spot from the gel and digest it with a 
proteolytic enzyme such as trypsin. The resulting di- 
gest mixture is then analyzed by matrix-assisted laser 
desorption ionization (MALDI) time of flight MS (26). 
The measured and optimized monoisotopic mass data
are then compared with theoretically derived peptide 
mass databases, generated by applying specific enzy- 
matic cleavage rules to predicted/known protein se- 
quences. Whereas MALDI-based peptide mass finger- 
printing enables high-throughput, accurate, and 
sensitive mass detection and may result in a large 
percentage of convincing protein identifications, many 
are ambiguous and require confirmation. Frequently, 
some digested proteins remain completely unidenti- 
fied, despite yielding measurable peptides. 
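
The matching step behind peptide mass fingerprinting can be illustrated with a short Python sketch. It assumes a simplified trypsin rule (cleavage after K or R unless the next residue is P, no missed cleavages), singly protonated peptides, and a plain count of matched masses as the score. The function names, the tolerance, and the toy sequences are hypothetical, and real search engines use more elaborate scoring.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # charge carrier for the [M+H]+ ion


def tryptic_peptides(sequence):
    """Split after K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol_ppm=50.0):
    """Number of observed masses that match a theoretical peptide mass."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for m in observed
    )


def rank_candidates(observed, database):
    """Rank database entries (name -> sequence) by matched peptide masses."""
    scores = {name: count_matches(observed, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy database and a simulated spectrum derived from the first entry.
toy_db = {"protA": "AKGGR", "protB": "WWWK"}
spectrum = [mh_plus("AK"), mh_plus("GGR")]
ranking = rank_candidates(spectrum, toy_db)
```

A tie or a low match count in such a ranking corresponds to the ambiguous identifications mentioned above, which is why MS/MS sequence data are often needed for confirmation.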

For unambiguous identification of 2D separated pro- 
teins, "peptide sequence tag" data derived by MS/MS 
(27) (with either MALDI or electrospray ionization ion 
sources) can be compared with expressed sequence tag 
databases, ever-expanding sources of genomic/pro- 
teomic information representing a number of organ- 
isms. 

Because the dynamic range of protein expression in 
most whole cell or tissue lysates is huge and only the 
most abundant proteins from 2D gels can be analyzed 
(despite excellent mass spectrometer sensitivity), far 
too many proteins are overlooked. Furthermore, hydrophobic
proteins and those with very alkaline pI are
poorly resolved on conventional 2D gels. Even in rarely 
employed very large format gels, no more than 10,000 
proteins have been analyzed on a single gel. To over- 
come the problems of sensitivity and scope, tissue/cell 
fractionation methods are used to enrich the sample 
gels with organellar proteins, and as a result, several
organellar proteomes are being characterized (9, 15,
19, 30).

Narrow-range IPG strips in 7-, 11-, 17-, or 24-cm 
lengths can bracket the pH range of first-dimension 
separations extending to fairly alkaline pH. Using
these strips represents another significant step in over- 
coming the limits of 2DE by significantly expanding 
the resolving power of otherwise broad-range IEF. Separations
achieved in narrow pI windows greatly increase
resolution and provide access to proteins that
are either undetectable or comigrate when separated 
using a broad-range pH gradient. This approach thus 
becomes integral to any attempt to analyze thousands 
of sample proteins, particularly for those low-abun- 
dance proteins that would otherwise remain undetec- 
ted. In this regard, "virtual" gel patterns, such as those 
described by Cordwell et al. (5), significantly increase 
the number of proteins resolved (Fig. 2) and represent 
an important technical development. Finally, signifi- 
cant improvements in protein-detection sensitivity 
have been achieved by incorporating fluorescent and 
MS-compatible silver stains (22, 24, 29). In combina- 
tion, these technical advances continue to make 2DE 
an important component of the proteomics arsenal. 

For example, 2DE, fluorescent staining (2D-DIGE), 
image analysis, and peptide mass fingerprinting were 
combined recently to analyze the cardiac mitochondrial 
proteome in murine creatine kinase (CK) double-knockouts
(KO) (20). Proteomic analysis demonstrated
that despite the absence of all isoforms of CK in the KO
mice, the cardiac mitochondrial proteome was identical 
to wild types, and, more importantly, its consistency 
mirrored a lack of altered cardiac function. Similarly, 
this approach was used recently to study human bron- 
chial biopsy samples. Proteins from these samples, 
which correlated to the transformation of normal fibro- 
blasts to myofibroblasts, were quantified and identified 
during the remodeling processes observed in asthma 
(35). Large-scale protein database development for 
toxicologic/pharmacological applications is an ongo- 
ing enterprise in many industrial laboratories. This 
is typified by Fountoulakis' 2D gel-based mouse liver 
protein database (8) that lists hundreds of identified 
proteins screened against acetaminophen and cites 
numerous other databases designed for general tox- 
icologic screening applications. 

Fig. 2. Schematic view of subproteome approach. Cells and tissues are
initially prefractionated into cellular compartments or relative protein
solubilities. The samples are screened using wide-range 2D gels to determine
sample complexity and then, if necessary, passed through higher resolution
2D to map low-abundance proteins, efficiently separate overlapping proteins,
and to map proteins to their cellular location. Panel labels: Cytosolic,
Nuclear, Secreted, Outer Membrane, Ribosomal, Others. Modified from Ref. 5.

Isotope-coded affinity tags. An emerging approach
that directly addresses the dynamic range and solubility
limitations of 2DE combines the separating power
of liquid chromatography (LC) with the highly accurate
and sensitive mass detection of tandem MS (13).
Isotope-coded affinity tags (ICATs) are reagents containing
a cysteine-reactive group, a linker with either eight
hydrogens (light) or deuteriums (heavy), and a biotin
(affinity) moiety. As shown in Fig. 3, by reacting each
sample with light or heavy ICAT, relative protein abundance
comparisons between two different cell states can be
made. The proteins from each sample are combined
and proteolytically digested, tagged peptides are collected
by affinity chromatography, peptides are analyzed
via LC-MS for relative quantitation of the isotopes
on identical peptides, and finally, peptides are
analyzed by LC-MS/MS for protein identification.
Despite limitations of its own, ICAT technology is being
improved (28) and has already proven its utility in 
functional studies. For example, characterization of 
analytically troublesome lipid-raft proteins has been 
simplified (33), the proteomic components of a complex 
cellular metabolic pathway have been studied in the 
context of their functional genomic elements (17), and 
proteins of subcellular microsomes have been identi- 
fied and quantified in differentiating human myeloid 
leukemia (HL-60) cells (14) using the ICAT approach. 
The latter investigation included a common addition to
contemporary proteomic approaches, namely multidimensional
chromatographic separation of complex peptide
mixtures. In this case, sample complexity was reduced 
by subjecting isotopically labeled proteolytic peptide 
mixtures to cation-exchange chromatography, avidin- 
affinity chromatography and reversed-phase HPLC be- 
fore automated mass spectrometric characterization. 
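
The relative quantitation step can be made concrete with a small Python sketch. It rests on the tag chemistry described above: a heavy tag carries eight deuteriums in place of eight hydrogens, so each tagged cysteine shifts the peptide mass by about 8.05 Da. The peak-list format, the fixed charge state, the tolerance, and the function name are illustrative assumptions, not part of any published ICAT software.

```python
# Mass shift between heavy and light tags: eight H -> D substitutions.
DEUTERIUM_MASS = 2.0141018
HYDROGEN_MASS = 1.0078250
TAG_SHIFT = 8 * (DEUTERIUM_MASS - HYDROGEN_MASS)  # about 8.0502 Da per tag


def icat_pairs(peaks, charge=1, n_cys=1, tol=0.02):
    """Find light/heavy peak pairs and report heavy/light intensity ratios.

    peaks: list of (m/z, intensity) tuples from one LC-MS scan (hypothetical
    input format). charge and n_cys are assumed known for simplicity; a real
    workflow would infer them from isotope spacing and MS/MS identification.
    Returns a list of (light m/z, heavy m/z, heavy/light ratio).
    """
    expected = n_cys * TAG_SHIFT / charge
    ordered = sorted(peaks)
    pairs = []
    for i, (mz_light, int_light) in enumerate(ordered):
        if int_light <= 0:
            continue
        for mz_heavy, int_heavy in ordered[i + 1:]:
            if abs((mz_heavy - mz_light) - expected) <= tol:
                pairs.append((mz_light, mz_heavy, int_heavy / int_light))
    return pairs


# Hypothetical scan: one labeled pair plus an unrelated peak.
scan = [(1000.50, 200.0), (1008.55, 100.0), (1500.00, 50.0)]
found = icat_pairs(scan)
```

Here the heavy-labeled state shows half the signal of the light-labeled state for the paired peptide, which would be read as a twofold lower abundance of the parent protein in that cell state.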

Fig. 3. Schematic representation of the isotope-coded affinity tag method.
The cysteine side chains in the complex mixtures of proteins from 2 different
cell states are reduced and alkylated using the light (d0) form of the tag
for the proteins in 1 cell state and the heavy (d8) form of the tag for the
proteins in the second cell state. The 2 mixtures are then combined and
subjected to a proteolytic digestion. The resultant complex mixture of
proteolytic peptides is purified with an avidin column to pull out only the
subset of labeled peptides via the affinity tag. Quantitation of differential
expression is based on the relative abundance of the isotopes in the mass
spectrometry spectrum. Modified from Mosely MA, Trends Biochem Sci,
Suppl 19, S11, 2001.

Because an estimated 20% of the proteins in the human
proteome contain no cysteine residue,
alternatives to cysteine-based labeling that can tag
every protein are being developed. For instance, a
clever variation of the ICAT approach has been developed
to identify and quantitate the extent of protein
phosphorylation using a phosphoprotein isotope-coded
affinity tag (PhIAT) (12). The PhIAT methodology is
similar to the ICAT approach in that it enables
proteome-wide purification and quantitation of peptides
containing specific types of residues, in this case
phosphopeptides. Its potential in the characterization of
cellular signaling is promising.

To both simplify complex peptide mixtures men- 
tioned above and to target specific subgroups of related 
proteins and selectively identify them, a strategy using 
"signature peptides" (10, 18) isolated by an array of 
elaborate yet automated affinity and chromatography 
separations is being developed to study global phospho- 
and glycoprotein expression (11, 25). These multidi- 
mensional separation approaches represent an effec- 
tive trend in proteomics (independent of ICATs), and 
their implementation should prove particularly useful 
in physiologically directed differential expression and 
qualitative proteomics studies. 

FUNCTIONAL AND STRUCTURAL PROTEOMICS 

Many protein-mediated cellular functions are man- 
aged and regulated by mechanisms that do not involve 
quantitative changes in expression. Instead, they are 
the consequences of qualitative modification of existing 
proteins, chemical additions such as phosphorylation, 
glycosylation, and lipidation, or modifications such as 
oxidation and deamidation. Second, proteins that me- 
diate most cellular processes function as constituents 
of macromolecular complexes, not as individual enti- 
ties acting independently. The proteomic approaches 
described thus far are rather reductionist. Clearly, 
these are useful ways in which to study the proteome; 
however, to effectively study function, investigators 
also must focus on protein-protein interactions and the 
characterization of multimeric protein complexes. In 
this respect, two areas in which proteomics is playing a 
significant role in our understanding of cellular func- 
tion are the characterization of posttranslational mod- 
ifications (functional proteomics) and protein interac- 
tions (structural proteomics). 

The phosphoproteome. Given the importance of pro- 
tein phosphorylation in the regulatory activities of 
cellular function and the amplification of signaling 
cascades that distinguish these activities from others, 
it is not surprising that phosphorylation is the most 
common covalent protein modification in mammalian 
cells. Indeed, the huge number of protein kinases and 
phosphatases encoded by the genome underscore their 
significance. Global analysis of the phosphoproteome 
has thus evolved into an integral facet of physiology. 
Historically, phosphoproteins were studied on Western 
blots using antiphosphoserine or antiphosphothreo- 
nine antibodies. This approach is still adequate quali- 
tatively but not quantitatively, because it suffers from 
the same general limitations of 2DE mentioned earlier. 
In-gel digestion and phosphopeptide analysis are 
deemed feasible but impractical (21). As alternatives, 
recent analytical approaches to the phosphoproteome
incorporate either phosphopeptide enrichment using
metal affinity columns, phosphatase treatment before
MS/MS, or the use of protein chips (39). These ap- 
proaches are necessitated by the low stoichiometry of 
protein phosphorylation and the fact that phosphopeptides
are generally detected with low efficiency or not at all 
by MS. Also, the hydrophilic phosphopeptides may be 
eluted and therefore lost in the void volume during 
reversed-phase peptide cleanup for MALDI. 

New methods are in use that combine chemical mod- 
ification and affinity purification for the characteriza- 
tion of serine and threonine phosphopeptides (1, 23). 
These methods are generally based on the chemical 
replacement of the phosphate moieties by affinity tags 
(biotinylation) followed by trypsin digestion. The bio- 
tinylated peptides are then enriched by affinity-isola- 
tion, analyzed by LC-MS/MS, and the phosphorylated 
residues are identified by automated database search- 
ing. This approach has widespread potential utility for 
defining signaling pathways and control mechanisms 
that involve phosphorylation or dephosphorylation of 
serine/threonine residues. 

In a related development, Snyder and his colleagues 
(40) have engineered a novel approach for high-throughput
screening of protein kinase (PK) activities by overproducing
all the yeast PKs as glutathione S-transferase
fusions and covalently affixing them to a chip surface in
microarray format. With the use of radiolabeled ATP, it was
discovered that particular proteins are preferred substrates
for particular PKs and that many PKs prefer particular
substrates. This approach has enormous potential application
in the study of mammalian and human PK systems.

Protein-protein interactions. As Eisenberg et al. (7) 
have so aptly proposed, "a protein is defined as an
element in the network of its interactions," and, as 
such, each protein in living cells functions as part of an 
extended web of interacting molecules. In this regard, 
a more holistic (as opposed to global) analysis of the 
proteome incorporates ingenious approaches that in- 
volve 1) affinity purification of protein complexes, the 
electrophoretic separation of the components, their 
tryptic digestion, and the identification of each element 
(16) or 2) centrifugation purification of cell compo- 
nents, tryptic digestion of the protein constituents fol- 
lowed by multidimensional liquid chromatography and 
tandem mass spectrometric identification (38). 

As an example of the first approach, Blackstock's 
group (16) isolated the mouse brain N-methyl-D-aspartate
(NMDA) receptor multiprotein complex (NRC)
and, by analyzing its components, provided information
that strongly suggests that subsets of neurotransmitter
receptors, cell-adhesion proteins, adapters, second
messengers, and cytoskeletal proteins are all
organized together into a physical unit comprising the 
signaling pathway. Furthermore, several novel fea- 
tures of the NRC observed in this study provide valu- 
able insight into the physiological context of NMDA 
receptor-dependent synaptic plasticity. 

In the second approach, over 100 proteins can be 
analyzed per run via direct analysis of large protein 
complexes. Applied to the eukaryotic ribosomal proteome,
its constituent complex of ~80 unique proteins
is rapidly and sensitively characterized, and unique 
features are identified. This process demonstrates con- 
siderable potential in characterizing, as well as detect- 
ing alterations in, other functionally relevant protein 
complexes in a variety of cell systems. 

In summary, the various cellular proteomes are dy- 
namic, and fluctuations in their characteristic expres- 
sion are central to their role in physiological regula- 
tion, disease and injury, and their response to chemical 
intervention. Without a doubt, it is therefore essential 
that we conduct both broad and directed analyses of 
the proteome's individual protein components to un- 
derstand the molecular underpinnings of physiological 
function. We must work to make certain the technolo- 
gies supporting such analyses continue to improve, in 
turn, to ensure that the boundaries to our understanding
disappear as a result. Despite the limitations of
current proteomics technology, there exist a number of 
approaches from which to choose, specific for each 
application. Whether one is interested in the differen- 
tial expression of a protein or group of proteins that 
underlie functional alterations, posttranslational mod- 
ification of resident proteins, or the complex constitu- 
ency and function of huge multiprotein complexes, the 
tools are available, and they are improving. 

This review has presented a limited sample of the 
many proteomic approaches and technologies relevant 
to the physiologist. A cursory look at the published 
literature by the reader will quickly demonstrate the 
utility of this approach in life science and the breadth 
in which its analytical power has been and will con- 
tinue to be applied. 